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Abstract 

Recently, it was demonstrated in Q, 111 II that the robustness of the classical Non-Local Means (NLM) 
C*~) , algorithm [1J can be improved by incorporating £ p (0 < p < 2) regression into the NLM framework. This 

| general optimization framework, called Non-Local Patch Regression (NLPR), contains NLM as a special 

CNI ' case. Denoising results on synthetic and natural images show that NLPR consistently performs better 

than NLM beyond a moderate noise level, and significantly so when p is close to zero. An iteratively 
• reweighted least-squares (IRLS) algorithm was proposed for solving the regression problem in NLPR, 

where the NLM output was used to initialize the iterations. Based on exhaustive numerical experiments, 
fNJ i we observe that the IRLS algorithm is globally convergent (for arbitrary initialization) in the convex 

regime 1 < p < 2, and locally convergent (fails very rarely using NLM initialization) in the non-convex 
\ regime < p < 1. In this letter, we adapt the "majorize-minimize" framework introduced in ||8| to 

' explain these observations. 

C/3 1 

Index Terms 

cn : 

. Non-local means, non-local patch regression, £ p minimization, non-convex optimization, iteratively 

, , reweighted least-squares, majorize-minimize, stationary point, relaxation sequence, linear convergence. 

^t- : 
p . 

CO . I. Introduction 

o 

m ■ 

■ In the last decade, some very effective frameworks for image restoration have been proposed that (a) 

£> ' exploit long-distance correlations in natural images, and (b) use patches instead of pixels to robustly 

•i-H ' 

^ \ compare photometric similarities. This includes the classical Non-Local Means (NLM) algorithm HI, 

and the more sophisticated BM3D algorithm El . The latter combines the NLM framework with other 
classical algorithms, and is widely considered as the state-of-the-art in image denoising. We refer the 
reader to ]3]| for a comprehensive review of patch-based algorithms. 

Let u = (ui) be some linear indexing of the input noisy image. In NLM, the restored image u = (Hi) is 
computed using the simple formula 

ik = ^m^i. ( i) 

Here, wtj is some weight (affinity) assigned to pixels i and j, and S(i) is some non-local (sufficiently 
large) neighborhood of pixel i over which the averaging is performed ffTH . In particular, for a given 
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pixel i, let Pj denote the restriction of u to a square window around i. Letting k be the length of this 
window, this associates every pixel i with a point P; in R fc " (the patch space) . The weights in NLM are 
set to be Wij = cxp( — ||Pj — Pj\\ 2 /h 2 ), where ||Pj — Pj|| is the Euclidean distance between P.; and Pj, 
and h is a smoothing parameter. 

Recently, it was demonstrated in Q, IfTUI that the robustness of the Non-Local Means (NLM) algorithm 
HXJ can be improved by incorporating £ p regression into the NLM framework. The idea was to fix some 
< p < 2, and consider the following unconstrained optimization on the patch space: 

Pi = arg mm ]T Wij ||P - Pjf, (2) 

where Wij are the weights used in NLM (one could also use other weights, e.g., see [9]). The denoised 
pixel Hi was then set to be the center pixel in Pj. Note that this reduces to the simple formula in 
J]]) when p = 2. In this case, the optimization is performed pixel-wise. For any other value of p, the 
optimization in (J2]) becomes a generic optimization on the patch space - the regression needs to be 
performed on patches and not just pixels. In particular, when < p < 1, the resulting estimator 
turns out to be more robust to "outliers" in the patch space (compared to standard NLM), and this 
leads to significant improvement in the denoising quality. We refer the reader to HI Oil for an intuitive 
understanding of the robustness in NLPR, and for denoising results on synthetic and natural images. 

Note that we can generally write the optimization problem in © as 

n 

min Wj || x — aj || p , (3) 

where oi, . . . , a n are given points in R rf , and w\ , . . . , w n are some positive weights. Motivated by the 
work on algorithms for £ p minimization in (5J, 0, the authors in H10H proposed to optimize © using 
iteratively reweighted least-squares (IRLS). Fixing some small e > 0, and initializing x^ as the NLM 
output, the update rule was set to be 

(*) 

, t+1) EjM) % 



(t > 0), (4) 

'i ^3 ' 
where 



M f=^(||xW-a J -|| 2 + £ f/ 2 - 1 . (5) 

We refer the reader to H10H for the heuristics behind the IRLS update in (J4]). Extensive numerical 
experiments show us the that the algorithm is globally convergent (for arbitrary a;^ ') in the convex 
regime 1 < p < 2, and locally convergent (fails very rarely compared to, say, the gradient or Newton 
method) in the non-convex regime < p < 1, provided that x^ ' is set as the NLM estimate. The former 
observation can be explained using the existing literature on the Weiszfeld algorithm Q, O, which 
is perhaps less well-known in the signal processing community. In this letter, we adapt the "majorize- 
minimize" framework introduced in (8j to specifically analyze Q when < p < 1. The analysis 
automatically covers the case 1 < p < 2. This is the content of Section [III in particular, we will show 
that the algorithm forces the cost to be non-increasing as the iteration progresses, and that it exhibits 
linear convergence both in the convex and non-convex (when convergence does occur) regime. We 
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note that in J61, the authors have done an analysis of a related (but more complex) IRLS algorithm in 
the non-convex regime, but the analysis is quite involved. The present analysis is much more simple, 
and is based on elementary results from smooth unconstrained optimization. 



II. Convergence Analysis 

The key question is what is the cost associated with the IRLS iterations in Q? This must be resolved 
even before we ask questions about convergence. It turns out that the iterations corresponds to a 
regularized version of the original cost ([3]). This is given by 

where \x\ e is the regularized version of the Euclidean norm, 

\x\l = \\xf+e (e>0). 

Note that $ £ corresponds to the original cost © when e = 0. It can be argued that the minimizers of 
$ £ converge to the minimizer of the original problem as e tends to zero. In other words, for sufficiently 
small s, the minimizer of $ £ is close to that of the original problem. Henceforth, to simplify notation, 
we fix some small e > and denote <f> e by <f>. We note that in H10H . the authors proposed to start with, 
say, e = 1, and then gradually shrink it to zero as the iteration progresses. While this does tend to 
speed up the convergence, the associated analysis becomes quite complicated. 

The advantage we get by considering the regularized problem is that the function <&(x) is smooth 
(infinitely differentiable) . This allows us to use the powerful tools of smooth optimization. Moreover, 
$ inherits the convex nature of the original problem, namely, that it is strictly convex for 1 < p < 2. 
Since $ is smooth, it suffices to show that its Hessian is positive definite. In fact, the gradient (denoted 
by $') and the Hessian (denoted by <£>") are given by 

=P^w j \x-a j \ p - 2 {x-a j ), 

3 

and 

=p^2w J \x - a 3 \ p - i \x - aj\ 2 £ I d - (2 - p)(x - aj)(x - aj) 1 

3 

Here, Id is the identity matrix of size d x d. For any non-zero u e R d , 

m t $"(.t)m = p^wjlx - aj \P- 4 \\x - aj \ 2 £ \\u\\ 2 - (2-p)(u T (x - a 3 )f 

3 

Since u T (x — aj) < \\u\\ ■ \\x — a,j\\ < \\u\\ ■ \x — a.j\ £ , the quadratic form is strictly larger than 

pip-l^uf^Wjlx-aj 
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This is non-negative if and only if 1 < p < 2. Therefore, $ is strictly convex in this case, and has a 
unique global minimizer x* for which <£>'(a;*) = 0. On the other hand, it is not difficult to see that 
$ need not be convex when p < 1. The best we can hope for in this case is that the iterates in © 
converge to some local stationary point. In fact, we can show that 
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Theorem 1: The IRLS update in guarantees the following: 

1) For < p < 2, the sequence ($(a; (t) )) is strictly monotonic, <&(cc (t+1) ) < <3>(a; (t) ) for all t. 

2) When 1 < p < 2, (a;'*)) converges linearly to the unique global minimizer of $. 

3) For < p < 1, under some additional assumptions, the convergence is again linear and the limit 
of (a;(*)) is a stationary point of 

By linear convergence, we mean that the convergence happens at an exponential rate. The relaxation 
property is particularly important for the non-convex setting, providing the guarantee that the cost 
at the end of the iterations is less than that obtained from the initial estimate x^°\ In optimization 
literature, one calls (x^) a relaxation sequence if $(a;^ t+1 ^) < Since $ is trivially bounded below, 

this implies convergence of the sequence (<&(x^)). As we will see, the iterates in Q unconditionally 
generate a relaxation sequence in both the convex and non-convex regime. This property turns out 
to be a central ingredient in the guarantees provided by Theorem [T] We note that the relaxation 
property was recently observed in [9] for the special case p = 1. However, we remark that the fact 
that the convergence of ($(a;0)) is not sufficient to guarantee convergence of (x^), as claimed in 
J9). For example, it is possible that (x^) keeps oscillating, or escapes to infinity while ensuring that 
$(a;( t+1 )) < <i>(a;(')). One of the cases can however be ruled out immediately: 

Proposition 2: For < p < 2, the IRLS iterates (a;''') are bounded (do not escape to infinity). 

This is a simple consequence of the observation that $(cc) — > oo as ||cc|| — > oo. Indeed, if (a;'*)) does 
escape to infinity, then we would have a contradiction since we just showed that ($(a;(*')) is bounded. 

In the convex regime, we will show that oscillations can also be ruled out. To do so in the non-convex 
regime, we will need additional assumptions. 

A. Majorize-Minimize interpretation 

To establish Theorem [TJ we will use the majorize-minimize (MM) framework from ]8) . In this frame- 
work, the key idea is to globally approximate $ from above using a sequence of quadratic functions. 
More precisely, after having found x^\ we construct a function ^(cc) = ^(x;x^) such that 

• < for all x. 

• \l/ t (cc) has aj( t+1 ) in ((4]) as its unique global minimizer. 

Once we have ^ t with the above properties, we immediately see that 

$(cc ( *+ 1) ) < * t (a; (t+1) ) < f t (xW) = $( X M). 
That is, we are guaranteed that (x^) is a relaxation sequence. We now need to specify ^t(x). 
Proposition 3: The following choice will suffice: 

* t (a0 = + ( x - x {t) ) T ®'(x {t) ) + | ][>f ||a - z (t) H 2 , (7) 

j 
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where is as defined in (JS]). 

Note that the linear part of \& t (cc) is simply the linear approximation of $ at x^\ and the quadratic 
form is derived from the dominant part of $"(a;0). By construction, $ t (ajw) = $(a;w). Moreover, it 
is clear that $>t(x) is strictly convex (for all p), and has a global unique minimizer. Setting 2;(* +1 ) to be 
this minimizer, we have 

mx^+V) = <S>'{x^)+pY,^\x {t+1) -* {t) ) =0- (8) 

3 

Substituting for we get the update rule in ©. 

To complete the proof, we need to show that < &t(x). Note that we can write $(cc) — ^t(x) as 

n 

Y^w^-aAl-^-a^l + p^ U ; J |a ; W-a J -|r 2 (^-^ t) ) T (^-a J -) 

+ l^l^-a.r^lx-xWf). 

i 

We substitute the following above: 

(x - x®f(x® - a,) = (x- aj ) T (xV - a,) - \\x® - a 3 \\ 2 , 

and 

\\x x^f = \\x + ||xW - a,f - 2(x - aj f(x^ a,). 

This allows us to simplify the expression to 

71 

where a 3 = |a;W — a 3 -|g and /?_,- = |aj — Ojlg. It can be verified that each term in the sum is non-negative 
for any < p < 2 (for p = 0, 1, and 2 this is obvious). This shows that $(x) < ^(x), concluding the 
proof of 0. 

B. Global and local convergence 

Since the sequence is monotonic and bounded below, it is convergent. In particular, $(a;^) — 

— > o as t — > oo. So, what can we say about the sequence (a;^)? 

Proposition 4: We claim that — a;(* +1 )|| — > as t gets large. 

To do so, we use ([8]), 

and the majorizing property, 

^(x^) < ^(x^) = + (a^ 1 ) - xW) T $'(xW) + ? V ^ ||* (t+1) - ajW || 2 . 

2 ^ — ' 

i 

Combining these, we see that 

$(x(* +1 )) <$(aj(*))-£ y^Wx^-x^f. 

3 
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Now, from (O we have the trivial bound p,j > WjS p ^ 2 1 . We can then write 

|| a .(W-l)_ aj (*)||2< 7 [$( a .(t))_$( a .C*+l))] j 

where 

Since ($(a;(*))) is convergent, we arrive at our claim. 

Note that cannot directly conclude from Proposition [4] that (x^) is convergent. However, since (x^') is 
bounded, it has convergent subsequences (by compactness) . In the convex regime, we can say something 
more: 

Proposition 5: For 1 < p < 2, every convergent subsequence has the same limit, and this limit is the 
unique stationary point of $. In particular, it is necessary that (x^) converges to x* 

We now establish the above claim. Let (x^ m >) be a subsequences that converges to x*. We know that 
|| x (m) _ a.(m+i)|| 0j so that the limit of both {x^) and (x ( ~ m+1 ^) must be x*. Note that 

= ^'{x^+V) = &(x^) +pJ2^ m \x {m+1) - x (m) ). 

3 

Since both &(x) and fij = fij(x) are smooth, letting m — > oo, we have 

= $'(z*) +p^ti; i |x* - a;j\ p - 2 {x* - x*) = 

i 

That is, a;* is a stationary point of $. In the convex regime 1 < p < 2, we know that $ has a unique 
stationary point x*. Therefore, every convergent subsequence of (x^') must have the same limit x*. 

Proposition 6: For < p < 1, the above claim is true only under certain local assumptions. 

The problem in this case is that there can be multiple stationary points of <f>, and the previous argument 
breaks down (as is typical with non-convex problems). All we know in this case is that (a;«) is bounded, 
and that the entire (x^) can be restricted to a ball B r (x*) of radius r around x* . Suppose we assume 
that that the initialization a;^ ^ is "good", in that it is situated sufficiently close to a local (probably 
global) minimizer x* . It is then possible that r is small enough and B r (x*) contains no other stationary 
points of <I>. In this case, we are guaranteed that every convergent subsequence, and hence the whole 
sequence (i"), converges to x*. 

C. Convergence rate 

Plots of log($(a;( t+1 )) — <fr(x*)) versus t for Q suggests a linear trend both for the convex and non- 
convex cases (assuming convergence for the latter). For the convex regime, we can indeed guarantee 
that 

Proposition 7: For 1 < p < 2, 

$( aj (*+ 1 )) - $(sb*) < iy(^(x^) - *(sb*)) (large t, v < 1). 
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In other words, the convergence is exponential, where the convergence rate is controlled by v. To 
establish our claim, we being by comparing $ at the points x(* +1 ) and the linear combination 9 t x^ + 
(1 — 9 t )x* (we will define 9 t later), 

$(a;(* +1 )) < * t (a; (t+1) ) < %(6 t x^ + (1 - t )x*). 

The first inequality follows from majorization, and the second from the optimality of x^ t+1 \ From (0, 
we can write $ t (^xW + (1 — 8 t )x*) as 

<&(x«) + (1 - 9 t )(x* - x«) T <I>'(x«) + P{1 ~ 6t)2 \\x* - x<*>|| 2 5>f • 
On the other hand, 

* t (sc*) = $(x^) + (x* - xW) r $'(»W) + -]|x* - asW|| a V/xf. (9) 

j 

Using this to eliminate the term containing <£'(ccw), we can write ^ t (0 t xW + (1 — 9 t )x*) as 

fltS^W) + (i - *,)(*(**) - ^5>?V - ^H 2 

3 

Now, set 

2(#(x*) - $(x*)) 



?t — 



pllx^xWlpv.^' 



(10) 



Then $(x( t+1 )) < ^ t (x(* +1 ') = t $(xW) + (1 - f )#(x*), and hence 



$(x (t+1) ) - $(x*) < t ($(a: (t) ) - $(x*)). 

By construction, t > for all i. We are done if we can show that 9 t < v < 1 for some constant v. By 
Taylor's theorem, 

$(x*) = $(x«) + (x* - x«) T $'(x«) + i(x* - x (t) ) T $"(yW)(x* - x^), (11) 
where is some point on the segment joining x* and x^K Plugging (O and (II 1 D in (I10D . 



(x*-x( t )) T $"(y( t ))(x^~x( t )) 
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a -, -x-> v ytr ^a: - g / ^ (t) \ - (t) 

p||x* -xm\\ 2 }2jVj j 
where A m j n (^4) denotes the smallest eigenvalue of the matrix A. Now, since $" is continuous, A m i„($"(j/W)) 
approaches A m i n ($"(x*)) as t — > oo. Moreover, x* is a (local) minimizer. Hence, 3>"(x*) is necessarily 
positive semidefinite, that is, A min ($"(x*)) > 0. This is true for < p < 2. Therefore, for all sufficiently 
large t, < 6 t < v < 1, where 

i 

and where //H = Wj(\\x* - a 3 \\ 2 + e) p / 2 - 1 > 0. 

For the convex regime 1 < p < 2, $"(&■*) is guaranteed to be positive definite, so that v < 1. This 
completes the proof of Proposition [Tj 

For the non-convex setting, the above argument holds under additional assumptions: 

Proposition 8: For < p < 1, assume that (xW) converges to the local minimizer x*, and that $"(x*) 
is positive definite. Then converges linearly to $(x*). 
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III. Discussion 

We note that it is rather difficult to extend the analysis in ]9) to the non-convex setting < p < 1. 
Moreover, it is not possible to estimate the convergence rate using the analysis in [9]. What their paper 
shows is that the cost is non-increasing for the case p = 1. One cannot study the behavior of the 
iterates using this result alone. For example, as our analysis shows, while the cost is non-increasing 
when < p < 1, it is not sufficient to guarantee convergence. We also note that the technique of 
proof used in and ]8j are quite different from that used is our paper. The basic idea of "majorize- 
minimize" is the same, but the mathematical ideas used in the present paper are different. Voss, for 
example, uses the sophisticated KKT theory (on polytopes) in his analysis. As against this, our analysis 
uses elementary ideas from smooth unconstrained optimization. Moreover, it is not obvious how the 
analysis in [7] and [8] can be extended to the non-convex regime < p < 1. 

The linear convergence of IRLS is typical of first-order methods. We have also tried second-order Newton 
methods for optimizing ([6]), which are guaranteed to exhibit quadratic convergence (locally). Indeed, 
Newton methods typically require less than half the number of iterations needed by the updates in 
(|4]) to reach a given accuracy. However, the cost of a single Newton step (computation of $" and its 
inversion) is significantly more than that of the simple update in Q. As a result, the total execution 
time of IRLS turns out to be smaller than that of Newton methods. The other point that we noticed 
from the numerical simulations is that IRLS is much more stable than Newton (or gradient descent) 
methods in the non-convex regime. In particular, the iterates in the Newton method often diverge to 
infinity if the initialization is not "close" to a local minimum. However, the IRLS iterates never escape 
to infinity (this is clear from Theorem Q]), and almost always converge to the global optimum when we 
initialize using the NLM output. In rare cases when © gets "stuck" in a local minimum (say, due to 
bad initialization), Newton methods are found to have the same problem. It would be interesting to 
see if one could give a more accurate analysis of the IRLS algorithm in the non-convex regime. 

Finally, we note that the analysis in this letter are about the convergence properties of the NLPR 
algorithm in H10H . and not about its denoising performance. These are two entirely different aspects 
of the algorithm. In H10H . empirical results about the performance of NLPR were reported, but the 
issue of convergence was not studied. We are currently investigating the "optimality" of the denois- 
ing performance of NLPR and possible scope for improvements, which will be reported in a future 
correspondence. 
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